[WIP] Add dynamic scheduling operation mode by mtreinish · Pull Request #271 · mtreinish/stestr

mtreinish · 2019-09-08T21:42:52Z

This commit adds an opt-in scheduler option for dynamic scheduling.
Instead of partitioning the test list up front based on historical
timing data, this commit lets each worker ask for the next test
dynamically. This is built using Python's multiprocessing module to
launch new workers instead of shelling out to call python via
subprocess.

This will hopefully provide better worker balance, since we keep each
worker occupied until there are no more tests to run instead of trying
to pack each worker optimally up front. Additionally, this should
hopefully improve the pdb story for users who use pdb with tests:
instead of spawning subprocesses that call python to invoke the
subunit runner and reading the subunit stream from stdout, it uses
multiprocessing to fork workers and uses pipes to pass the subunit
streams between workers.
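
To illustrate the pull model described above, here is a minimal sketch; it is not the code from this PR. The names `worker_loop`, `run_dynamic`, and `fake_run_test` are hypothetical, and a `None` sentinel stands in for whatever end-of-work signal the real scheduler uses. Each worker takes the next test id from a shared queue as soon as it is free, so nothing is partitioned up front.

```python
import multiprocessing

_DONE = None  # hypothetical sentinel: tells a worker there is no more work


def fake_run_test(test_id):
    # Placeholder for actually running a test and emitting subunit output.
    return "success"


def worker_loop(test_queue, result_queue, run_test):
    """Pull test ids one at a time until the sentinel arrives."""
    while True:
        test_id = test_queue.get()
        if test_id is _DONE:
            break
        result_queue.put((test_id, run_test(test_id)))


def run_dynamic(test_ids, concurrency):
    """Run test_ids across `concurrency` workers that pull work on demand."""
    test_ids = list(test_ids)
    test_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    for test_id in test_ids:
        test_queue.put(test_id)
    for _ in range(concurrency):
        test_queue.put(_DONE)  # one sentinel per worker
    workers = [
        multiprocessing.Process(
            target=worker_loop,
            args=(test_queue, result_queue, fake_run_test),
        )
        for _ in range(concurrency)
    ]
    for worker in workers:
        worker.start()
    # Drain the results before joining so no worker blocks on a full pipe.
    results = dict(result_queue.get() for _ in test_ids)
    for worker in workers:
        worker.join()
    return results
```

With the "spawn" or "forkserver" start methods, `run_dynamic(["test_a", "test_b", "test_c"], concurrency=2)` has to be called from under an `if __name__ == "__main__":` guard in an importable module.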

mtreinish · 2019-09-08T21:45:43Z

This still fails some tests, but after switching away from subunit.run it seems to run reliably on Python >= 3.5. It will also need proper test coverage, since this flow is very different from what was there before. I should probably add a dynamic job to Travis and AppVeyor to split running the full suite between the old way and this new approach.

aspiers

Thanks for accepting my suggestion. As per this comment this doesn't seem to work for me yet.

coveralls · 2019-10-29T22:38:07Z

Coverage decreased (-21.2%) to 46.985% when pulling e8989ca on dynamic-schedule into bbc839f on master.

Co-Authored-By: Adam Spiers <github@adamspiers.org>

This commit fixes the failing tests by catching a couple of things missed in the update. The biggest fix is for the --no-discover case: we still use a subprocess there, so we need to tell output.ReturnCodeToSubunit that the input is not dynamic (and is therefore a Popen object) so it can handle that properly. The other major change is that the return code tests are updated so that the stdout and stderr from the subprocess calls are always decoded in the non-subunit test cases. This was done primarily for ease of debugging, but it also enabled the removal of several decode() calls when the output is parsed.
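
The distinction matters because the two handle types expose their exit status differently: `subprocess.Popen` uses `wait()` and `returncode`, while `multiprocessing.Process` uses `join()` and `exitcode`. A minimal sketch of that branch follows. `wait_for_return_code`, its `dynamic` parameter, and `run_child` are hypothetical names for illustration and are not the actual `output.ReturnCodeToSubunit` interface.

```python
import subprocess
import sys


def wait_for_return_code(proc, dynamic=True):
    """Block until the worker finishes and return its exit status.

    Hypothetical helper: `dynamic` says whether `proc` is a
    multiprocessing.Process (True) or a subprocess.Popen (False).
    """
    if dynamic:
        # multiprocessing.Process: join() blocks, exitcode holds the status.
        proc.join()
        return proc.exitcode
    # subprocess.Popen: wait() blocks and returns the return code.
    return proc.wait()


def run_child(exit_status):
    """Start a child Python process that exits with the given status."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_status})"]
    )
```

For example, `wait_for_return_code(run_child(3), dynamic=False)` waits on a Popen object, whereas passing the same object with `dynamic=True` would fail because Popen has no `join()` method.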

This is a refinement of the previous commit to reduce unnecessary changes to the functional tests in the test_return_codes module. Mainly, always decoding the output from the subprocess for testing broke things unexpectedly where a bytes object was expected.

Conflicts: stestr/commands/run.py stestr/output.py stestr/test_processor.py stestr/tests/test_return_codes.py

I originally developed this feature when stestr still supported older Python versions. The dynamic scheduling feature depends on functionality added in Python 3.5. The WIP feature branch then sat stale for years, and in that time we've bumped the minimum supported Python version to 3.7, so the runtime check for older Python versions is no longer needed.

codecov-commenter · 2023-07-14T19:16:09Z

Codecov Report

❌ Patch coverage is 19.80198% with 81 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.74%. Comparing base (20ec64d) to head (71e8eb0).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| stestr/scheduler.py | 2.63% | 37 Missing ⚠️ |
| stestr/test_processor.py | 27.90% | 30 Missing and 1 partial ⚠️ |
| stestr/output.py | 30.76% | 7 Missing and 2 partials ⚠️ |
| stestr/commands/run.py | 42.85% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
- Coverage   61.42%   59.74%   -1.68%     
==========================================
  Files          30       30              
  Lines        2613     2703      +90     
  Branches      404      421      +17     
==========================================
+ Hits         1605     1615      +10     
- Misses        889      964      +75     
- Partials      119      124       +5

| Flag | Coverage Δ |
|---|---|
| unittests | `59.74% <19.80%> (-1.68%)` ⬇️ |

This commit fixes an issue that occurred in earlier commits on the PR around the initialization of the worker processes and the scope of the launch method. Previously, if the method used to launch the workers returned before all the workers had accessed the queue for the first time, those workers wouldn't be able to read from the queue. This race condition arose because the Queue was locally scoped to the method and would be deleted by the main process before the other workers could read it. It occurred specifically on systems using the "forkserver" or "spawn" multiprocessing start methods, because the child processes didn't have the queue object; with "fork" they did, because the process memory was copied into the child process. This commit fixes this by scoping the Queue object to the instance, which means it survives as long as the test processor object does (typically the entire run command).

As part of this change, the start method used by the new dynamic scheduler is fixed to "spawn" to minimize any potential interactions between stestr and the code under test. This mirrors the behavior of the non-dynamic scheduler mode, because spawn is roughly equivalent to calling python in a subprocess.
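
The scoping fix can be sketched as follows. This is an illustration under assumptions, not the PR's code: `DynamicRunner`, `launch`, and the `None` sentinel are hypothetical. The point is that the queue is created from a "spawn" context and stored on the instance, so the parent keeps a reference to it after `launch()` returns and while the children are still starting up.

```python
import multiprocessing


class DynamicRunner:
    """Hypothetical stand-in for the object that owns the worker processes."""

    def __init__(self, test_ids, concurrency=2):
        # Pin the start method instead of relying on the platform default.
        self.ctx = multiprocessing.get_context("spawn")
        # Instance attribute: the queue lives as long as this object does,
        # rather than being dropped when launch() returns.
        self.test_queue = self.ctx.Queue()
        self.test_ids = list(test_ids)
        self.concurrency = concurrency
        self.workers = []

    def launch(self, target):
        """Start the workers; `target` must be a module-level function
        so the spawned children can import it."""
        for test_id in self.test_ids:
            self.test_queue.put(test_id)
        for _ in range(self.concurrency):
            self.test_queue.put(None)  # one end-of-work sentinel per worker
        self.workers = [
            self.ctx.Process(target=target, args=(self.test_queue,))
            for _ in range(self.concurrency)
        ]
        for worker in self.workers:
            worker.start()
        return self.workers
```

A caller would keep the `DynamicRunner` instance alive until every worker has been joined, and would invoke `launch()` from under an `if __name__ == "__main__":` guard, as the "spawn" start method requires.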

This commit improves the documentation of the new --dynamic flag to explain how it operates and what its goal is. It also makes it clear that the feature is experimental and opt-in at your own risk. Testing also showed that this doesn't currently work on Windows. Instead of blocking the feature over a platform used by 2-3% of our users (according to https://pypistats.org/packages/stestr), this just marks Windows as currently unsupported. We will have to revisit how to make this work on Windows before we stabilize the feature.

thomasgoirand · 2026-03-12T09:15:46Z

Under Python 3.13, I get:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/stestr/output.py", line 183, in __del__
    self.proc.join()
AttributeError: 'dict' object has no attribute 'join'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=103)
                                                  ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.13/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/lib/python3.13/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory

mtreinish mentioned this pull request Sep 8, 2019

Add a --pdb option to stestr run #14

Closed

aspiers reviewed Sep 9, 2019

Comment thread stestr/commands/run.py Outdated

aspiers reviewed Sep 9, 2019

mtreinish and others added 4 commits November 24, 2019 06:17

Update stestr/commands/run.py

59189e6

Co-Authored-By: Adam Spiers <github@adamspiers.org>

mtreinish force-pushed the dynamic-schedule branch from 8933610 to e68b506 Compare November 24, 2019 11:20

mtreinish added 5 commits August 9, 2020 08:41

Merge remote-tracking branch 'origin/master' into dynamic-schedule

e47ad6e

Merge branch 'master' into dynamic-schedule

e8989ca

Try running with --dynamic everywhere

64d8a0b

Merge remote-tracking branch 'origin/main' into dynamic-schedule

a474f3f

Fix pep8

4fcb27a

mtreinish mentioned this pull request Aug 5, 2021

Support for clean termination on receiving SIGNIT #315

Open

mtreinish added 3 commits August 5, 2021 13:50

Merge branch 'main' into dynamic-schedule

1fa2454

Merge remote-tracking branch 'origin/main' into dynamic-schedule

6055ca8

mtreinish added 6 commits July 14, 2024 08:46

Merge branch 'main' into dynamic-schedule

f7900e7

Merge branch 'main' into dynamic-schedule

455b2eb

Merge branch 'main' into dynamic-schedule

213eff6

Fix docs and lint

f7725b0

Merge branch 'main' into dynamic-schedule

71e8eb0

mtreinish wants to merge 19 commits into main from dynamic-schedule

5 participants
